Combining Global and Personal Anti-Spam Filtering

Author

  • Richard Segal

Abstract

Many of the first successful applications of statistical learning to anti-spam filtering were personalized classifiers that were trained on an individual user’s spam and ham e-mail. Proponents of personalized filters argue that statistical text learning is effective because it can identify the unique aspects of each individual’s e-mail. On the other hand, a single classifier learned for a large population of users can leverage the data provided by each individual user across hundreds or even thousands of users. This paper investigates the trade-off between globally- and personally-trained anti-spam classifiers. We find that globally-trained text classification easily outperforms personally-trained classification under realistic settings. This result does not imply that personalization is not valuable. We show that the two techniques can be combined to produce a modest improvement in overall performance.

Similar resources

Autonomous Personal Filtering Improves Global Spam Filter Performance

Using two email streams, we show that a personal filter trained exclusively on user feedback substantially outperforms (p ≈ 0.000) three industry-leading global spam filters not using feedback. We show that autonomous personal filters, trained on the output from a global spam filter rather than user feedback, substantially outperform (p ≈ 0.000) the global filter, if by a somewhat smaller facto...

Stacking Classifiers for Anti-Spam Filtering of E-Mail

We evaluate empirically a scheme for combining classifiers, known as stacked generalization, in the context of anti-spam filtering, a novel cost-sensitive application of text categorization. Unsolicited commercial email, or “spam”, floods mailboxes, causing frustration, wasting bandwidth, and exposing minors to unsuitable content. Using a public corpus, we show that stacking can improve the eff...

Survey on Spam Filtering Techniques

In recent years, spam has become a big problem for the Internet and electronic communication. Many techniques have been developed to fight it. This paper gives an overview of existing e-mail spam filtering methods. The classification, evaluation, and comparison of traditional and learning-based methods are provided. Some personal anti-spam products are tested and compared. The statement f...

Evaluation of Anti-spam Method Combining Bayesian Filtering and Strong Challenge and Response

Recently, various schemes against spam have been proposed because of the rapid increase in spam. Some schemes are based on sender whitelisting with auto registration, a principle under which a recipient reads only messages from senders who are registered by the recipient, and a sender has to perform some procedure to be registered (challenge-response). In these schemes, some exceptions are required to show e...

Variable Thresholding In Naïve Bayesian Spam Filters

Email has become an essential means of communication for both business and personal use. However, the proliferation of unwanted email advertising or spam has cost organizations millions of dollars and has reduced the effectiveness of email as a communications medium. Recently, spam filters have been widely adopted as a means of combating these unwanted messages. This paper presents a method for...

Publication date: 2007